Hands-on Ex 9A

Processing & Visualising Flow Data

Author

Stephen Tay

Published

November 2, 2024

Modified

November 3, 2024

The hands-on exercise on Modelling Geographical Accessibility is in Hands-on Ex 10

1. Overview

Spatial interaction (SI) refers to the movement of people, goods, or information between locations, covering activities like freight shipments, energy flows, global trade, commutes, and pedestrian traffic. Defined broadly, SI is the flow of individuals, commodities, capital, and information across space, often influenced by decision-making (Fotheringham & O’Kelly, 1989). The key principle of SI is that individuals or entities weigh the benefits of interaction (such as commuting to work or migrating) against the costs of overcoming spatial distance (Fischer, 2001), a core concept for understanding spatial behavior.

Each interaction forms an origin/destination (OD) pair, which can be organized in an OD matrix, mapping origins to destinations. In this exercise, we will build an OD matrix using Passenger Volume by Origin Destination Bus Stops data from LTA DataMall to explore SI in practical applications.

The following R packages are used, with stplanr being the key package used for transport planning and modeling. It provides tools to download and clean transport data, map “desire lines” (the direct routes people prefer to take between locations), assign routes for travel (including options for cycling routes), calculate details about each route, like direction and traffic flow, and analyze areas reachable within specific travel times.

pacman::p_load(tmap, sf, DT, stplanr, tidyverse)

2. Importing & Preparing Data

We will import and work with three datasets:

  • Bus Commuters by Origin/Destination data from LTA DataMall
  • Bus Stop Locations: Data on bus stop locations as of the last quarter of 2022.
  • MPSZ-2019: Sub-zone boundary data from the URA Master Plan 2019.
odbus <- read_csv("data/aspatial/origin_destination_bus_202210.csv")
Rows: 5122925 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): YEAR_MONTH, DAY_TYPE, PT_TYPE
dbl (4): TIME_PER_HOUR, ORIGIN_PT_CODE, DESTINATION_PT_CODE, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(odbus)
Rows: 5,122,925
Columns: 7
$ YEAR_MONTH          <chr> "2022-10", "2022-10", "2022-10", "2022-10", "2022-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 10, 10, 7, 11, 16, 16, 20, 7, 7, 11, 11, 8, 11, 11…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <dbl> 65239, 65239, 23519, 52509, 54349, 54349, 43371, 8…
$ DESTINATION_PT_CODE <dbl> 65159, 65159, 23311, 42041, 53241, 53241, 14139, 9…
$ TOTAL_TRIPS         <dbl> 2, 1, 2, 1, 1, 4, 1, 3, 1, 5, 2, 5, 15, 40, 1, 1, …

Since ORIGIN_PT_CODE and DESTINATION_PT_CODE are in numeric data type, as.factor() function is used to convert it to character/factor data type.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

For this exercise, we will focus on weekday commuting flows between 6:00 and 9:00 am. The eventual dataset is the total number of trips between origin and destination bus stops.

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.
datatable(odbus6_9)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

We use st_transform() to transform the projection to CRS 3414.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/Hands-on_Ex/Hands-on_Ex09/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5159 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

We use st_transform() to transform the projection to CRS 3414.

mpsz <- st_read(dsn = "data/geospatial",
                layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `/Users/stephentay/stephentay/ISSS626-Geospatial-Analytics/Hands-on_Ex/Hands-on_Ex09/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...

As the flow analysis is at the subzonal level, we will need to retrieve the planning subzone code (SUBZONE_C) for each bus stop. The st_intersection() function performs a point-in-polygon overlay, resulting in a point sf dataframe that matches each bus stop to its respective subzone. Next, select() is used to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz data frame.

Note that five bus stops are excluded from the result as they fall outside Singapore’s boundary.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries

The code below found that there were duplicated bus stop records.

duplicate <- busstop_mpsz %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 24 × 2
   BUS_STOP_N SUBZONE_C
   <chr>      <chr>    
 1 82221      GLSZ05   
 2 82221      GLSZ05   
 3 22501      JWSZ09   
 4 22501      JWSZ09   
 5 96319      TMSZ05   
 6 96319      TMSZ05   
 7 43709      BKSZ07   
 8 43709      BKSZ07   
 9 68091      SLSZ04   
10 68091      SLSZ04   
# ℹ 14 more rows

We use unique() to de-duplicate the records.

busstop_mpsz <- unique(busstop_mpsz)
datatable(busstop_mpsz)

Next, we append the planning subzone code from busstop_mpsz dataframe to the origin bus stops from odbus6_9 dataframe.

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)
Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 103964 of `x` matches multiple rows in `y`.
ℹ Row 161 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

We can use the code below to check for duplicated records.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 0 × 4
# ℹ 4 variables: ORIGIN_BS <chr>, DESTIN_BS <fct>, TRIPS <dbl>, ORIGIN_SZ <chr>

If there are duplicated records, we use unique() to de-duplicate them.

od_data <- unique(od_data)

Now, we append the planning subzone code from busstop_mpsz dataframe to the destination bus stops from odbus6_9 dataframe.

od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 11263 of `x` matches multiple rows in `y`.
ℹ Row 1379 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

We can use the code below to check for duplicated records.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 0 × 5
# ℹ 5 variables: ORIGIN_BS <chr>, DESTIN_BS <chr>, TRIPS <dbl>,
#   ORIGIN_SZ <chr>, SUBZONE_C <chr>

If there are duplicated records, we use unique() to de-duplicate them.

od_data <- unique(od_data)
glimpse(od_data)
Rows: 223,468
Columns: 5
Groups: ORIGIN_BS [4,995]
$ ORIGIN_BS <chr> "1012", "1012", "1012", "1012", "1012", "1012", "1012", "101…
$ DESTIN_BS <chr> "1112", "1113", "1121", "1211", "1311", "7371", "60011", "60…
$ TRIPS     <dbl> 168, 88, 77, 65, 172, 8, 26, 1, 26, 21, 2, 8, 1, 10, 11, 3, …
$ ORIGIN_SZ <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ SUBZONE_C <chr> NA, NA, NA, NA, NA, NA, "KLSZ08", "KLSZ08", "KLSZ02", "KLSZ0…

In the final step, we remove rows with null values in either the origin or destination subzone, then aggregate the total number of trips for each origin-destination subzone pair.

od_data <- od_data %>%
  rename(DESTIN_SZ = SUBZONE_C) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.
glimpse(od_data)
Rows: 14,734
Columns: 3
Groups: ORIGIN_SZ [279]
$ ORIGIN_SZ    <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01…
$ DESTIN_SZ    <chr> "AMSZ01", "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMSZ06…
$ MORNING_PEAK <dbl> 1998, 8289, 8971, 2252, 6136, 2148, 1620, 1925, 1773, 63,…

3. Visualising Spatial Interaction

In this section, we will visualise “desire lines” (the routes people prefer to take between locations) using the stplanr package.

3.1 Remove intra-zonal flows

This study excludes intra-zonal flows, so they are removed from the analysis.

od_data_fij <- od_data[od_data$ORIGIN_SZ != od_data$DESTIN_SZ,]

3.2 Create inter-zonal desire lines

We use the od2line() function from the stplanr package to create inter-zonal desire lines.

flowLine <- od2line(flow = od_data_fij, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
Creating centroids representing desire line start and end points.

3.3 Visualise desire lines

The map below shows inter-zonal bus commuter flows on weekdays between 6:00 and 9:00 am.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.5)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length

When inter-zonal bus commuter flows are complex or highly skewed, it can be effective to focus on selected flows, such as those with values greater than or equal to 5,000, as shown below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.5)
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length